
Auto eval actually works #310

Merged: 15 commits into main, Aug 30, 2024
Conversation

@vwxyzjn vwxyzjn (Collaborator) commented Aug 29, 2024

Auto eval that actually works (also with oe-eval). The idea is to do something like the following:

Update (10:30): now testing with docker containers.

```python
start_time = time.time()
while time.time() - start_time < args.max_wait_time_for_beaker_dataset_upload_seconds:
    if beaker_experiment_succeeded(beaker_runtime_config.beaker_workload_id):
        print("Experiment succeeded")
        # NOTE: we are assuming the first beaker dataset has the model
        # I have checked a couple of beaker jobs and found the first dataset is the model
        # but we should check this assumption
        submit_beaker_eval_jobs(
            model_name=args.model_name,
            location=beaker_dataset_ids[0],
            run_oe_eval_experiments=True,
        )
        return
    time.sleep(args.check_interval_seconds)
# If we reach here, the experiment failed
print("Experiment failed")
sys.exit(1)  # submit eval failed
```
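The wait-and-check logic above is a poll-with-timeout pattern. As a standalone sketch (not code from this PR), it can be factored into a small helper; the `sleep` and `clock` parameters are added here purely so the pattern is testable without real waiting:

```python
import time


def poll_until(check, timeout_s, interval_s, sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True on success, False on timeout. sleep/clock are
    injectable so tests can run without actually sleeping.
    """
    start = clock()
    while clock() - start < timeout_s:
        if check():
            return True
        sleep(interval_s)
    return False
```

With this helper, the loop in the PR collapses to `if poll_until(lambda: beaker_experiment_succeeded(...), args.max_wait_time_for_beaker_dataset_upload_seconds, args.check_interval_seconds): submit_beaker_eval_jobs(...)`.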

[screenshot: docker submission]

[screenshot: mason submission]

open_instruct/utils.py (resolved)
Comment on lines +35 to +37
# NOTE: we are assuming the first beaker dataset has the model
# I have checked a couple of beaker jobs and found the first dataset is the model
# but we should check this assumption
vwxyzjn (Collaborator, Author):

@yizhongw @jacob-morrison For multi-node jobs I assumed the first beaker dataset will have the model. Is this always the case?

Contributor:

I checked my experiment list and it seems so; I didn't see any exceptions.

Collaborator:

My one concern is whether this is still the case when preemption happens, but that's sort of an edge case, so happy to merge and then patch later.
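Rather than relying on dataset ordering, one could select the dataset whose file list looks like a model checkpoint. A minimal sketch, assuming each dataset can be listed as a `(dataset_id, [filenames])` pair; `pick_model_dataset` and the marker filenames are hypothetical helpers for illustration, not the Beaker API:

```python
def pick_model_dataset(datasets):
    """Return the id of the first dataset that looks like a model checkpoint.

    `datasets` is a list of (dataset_id, [filenames]) pairs. The marker
    filenames below are an assumption about what a HF-style checkpoint
    contains, not something guaranteed by Beaker.
    """
    MODEL_MARKERS = ("config.json", "pytorch_model.bin", "model.safetensors")
    for dataset_id, files in datasets:
        if any(name.endswith(MODEL_MARKERS) for name in files):
            return dataset_id
    raise ValueError("no dataset with model files found")
```

A check like this would also be robust to preempted jobs that produce extra log-only datasets before the model one.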

@vwxyzjn vwxyzjn marked this pull request as ready for review August 29, 2024 17:17
```diff
@@ -704,7 +763,7 @@ def upload_metadata_to_hf(
     # about a model for leaderboard displays.
     with open("tmp.json", "w") as f:
         json.dump(metadata_dict, f)
-    api = HfApi(token=os.getenv("HF_TOKEN", None))
+    api = HfApi()
```
vwxyzjn (Collaborator, Author):


huggingface_hub already picks up the HF_TOKEN environment variable by default, so there is no need to pass it explicitly.
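The default-from-environment pattern can be illustrated without huggingface_hub itself; `Client` below is a stand-in for the idea, not the HfApi implementation. An explicitly passed token wins, otherwise the environment variable is consulted, which is why `HfApi(token=os.getenv("HF_TOKEN", None))` was redundant:

```python
import os


class Client:
    """Stand-in illustrating token resolution: explicit argument first,
    then the HF_TOKEN environment variable."""

    def __init__(self, token=None):
        self._token = token

    @property
    def token(self):
        # Fall back to the environment only when no token was passed.
        return self._token if self._token is not None else os.environ.get("HF_TOKEN")
```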

```diff
@@ -1001,49 +1001,65 @@ def main(args: FlatArguments):
     )

     # remove all checkpoints to save space
-    if accelerator.is_main_process:
+    if accelerator.is_local_main_process:
```
vwxyzjn (Collaborator, Author):

This should remove the intermediate checkpoints on all the nodes.

Contributor:

Looks good!

@vwxyzjn vwxyzjn mentioned this pull request Aug 29, 2024
@natolambert natolambert (Collaborator) left a comment:

seems good

@vwxyzjn vwxyzjn merged commit 4c0e9e8 into main Aug 30, 2024
3 checks passed
